Explore the power of Python's gzip module for efficient stream compression and decompression. Learn practical techniques, best practices, and international use cases for optimizing data transfer and storage.
Python Gzip Compression: Mastering Stream Compression and Decompression for Global Applications
In today's data-driven world, efficient data handling is paramount. Whether you are transmitting sensitive information across continents, archiving vast datasets, or optimizing application performance, compression plays a crucial role. Python, with its rich standard library, offers a powerful and straightforward way to handle compressed data through its gzip module. This article takes a close look at the gzip module, focusing on stream compression and decompression, with practical examples and attention to its role in global applications.
Understanding Gzip Compression
Gzip is a widely adopted file format and software application used for lossless data compression. Developed by Jean-Loup Gailly and Mark Adler, it is based on the DEFLATE algorithm, a combination of the LZ77 algorithm and Huffman coding. The primary goal of gzip is to reduce file size, which minimizes storage space and speeds up data transmission over networks.
Key characteristics of Gzip:
- Lossless compression: Gzip guarantees that no data is lost during compression and decompression. The original data can be reconstructed exactly from the compressed version.
- Ubiquitous support: Gzip is standard on most Unix-like operating systems and is natively supported by many web servers and browsers, which makes it a good fit for web content delivery.
- Stream-oriented: Gzip is designed to work on data streams. Data can be compressed or decompressed as it is read or written, without loading the entire dataset into memory. This is particularly useful for large files and real-time data processing.
Python's gzip Module: An Overview
Python's built-in gzip module provides a convenient interface for compressing and decompressing files in the Gzip format. It is designed to be compatible with the GNU zip application and offers functions similar to those found in Python's standard file handling. This lets developers treat compressed files almost like regular files, which simplifies integrating compression into applications.
The gzip module provides several key classes and functions:
- gzip.GzipFile: a class with a file-object-like interface for reading from and writing to gzip-compressed files.
- gzip.open(): a convenience function that opens a gzip-compressed file in binary or text mode, much like Python's built-in open() function.
- gzip.compress(): a simple function that compresses a bytes string.
- gzip.decompress(): a simple function that decompresses a gzip-compressed bytes string.
Stream Compression with gzip.GzipFile
The gzip module really shines when working with data streams. This is especially relevant for applications that handle large volumes of data, such as logging, data backups, and network communication. With gzip.GzipFile, you can compress data on the fly as it is generated or read from another source.
Compressing Data to a File
Let's start with a basic example: compressing a bytes string to a .gz file. We open a GzipFile object in write-binary mode ('wb').
import gzip
import os

data_to_compress = b"This is a sample string that will be compressed using Python's gzip module. It's important to use bytes for compression."
file_name = "compressed_data.gz"

# Open the gzip file in write binary mode
with gzip.GzipFile(file_name, 'wb') as gz_file:
    gz_file.write(data_to_compress)

print(f"Data successfully compressed to {file_name}")

# Verify file size (optional)
print(f"Original data size: {len(data_to_compress)} bytes")
print(f"Compressed file size: {os.path.getsize(file_name)} bytes")
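In this example:
- We import the gzip module.
- We define the data to compress as a bytes string (b"..."). Gzip operates on bytes, not on str.
- We specify the output file name, conventionally with a .gz extension.
- We use a with statement so that the GzipFile is closed properly even if an error occurs.
- gz_file.write(data_to_compress) writes the compressed data to the file.
For a string this short, do not expect a dramatic size reduction. The gzip header and trailer add a fixed overhead of 18 bytes plus the stored file name, so the compressed file may be about the same size as the original, or even slightly larger. The savings become clear on larger, more repetitive inputs.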
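To make the effect of the fixed gzip overhead concrete, here is a minimal sketch that compresses a very short input and a long repetitive one. The sample data and variable names are made up for illustration; exact output sizes depend on the zlib build, so none are stated here.

```python
import gzip

# Illustrative inputs (not from the article): one tiny, one large and repetitive
short_data = b"hello"
repetitive_data = b"Some repetitive text for compression. " * 1000

short_compressed = gzip.compress(short_data)
repetitive_compressed = gzip.compress(repetitive_data)

# The tiny input grows because the header and trailer outweigh any savings;
# the repetitive input shrinks to a small fraction of its original size.
print(f"short:      {len(short_data)} -> {len(short_compressed)} bytes")
print(f"repetitive: {len(repetitive_data)} -> {len(repetitive_compressed)} bytes")
```

In practice this means gzip pays off for payloads well beyond a few dozen bytes, and it is usually not worth compressing very small messages individually.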
Compressing Data from an Existing Stream
A more common use case is compressing data that comes from another source, such as a regular file or a network socket. The gzip module integrates seamlessly with such streams.
Suppose you have a large text file (for example, large_log.txt) and want to compress it on the fly without loading the whole file into memory.
import gzip

input_file_path = "large_log.txt"
output_file_path = "large_log.txt.gz"

# Assume large_log.txt exists and contains a lot of text
# For demonstration, let's create a dummy large file:
with open(input_file_path, "w") as f:
    for i in range(100000):
        f.write(f"This is line number {i+1}. Some repetitive text for compression. \n")
print(f"Created dummy input file: {input_file_path}")

try:
    # Open the input file in read binary mode
    with open(input_file_path, 'rb') as f_in:
        # Open the output gzip file in write binary mode
        with gzip.GzipFile(output_file_path, 'wb') as f_out:
            # Read data in chunks and write to the gzip file
            while True:
                chunk = f_in.read(4096)  # Read in 4KB chunks
                if not chunk:
                    break
                f_out.write(chunk)
    print(f"Successfully compressed {input_file_path} to {output_file_path}")
except FileNotFoundError:
    print(f"Error: Input file {input_file_path} not found.")
except Exception as e:
    print(f"An error occurred: {e}")
Here:
- We read the input file in binary mode ('rb'), because gzip expects bytes.
- We write to gzip.GzipFile in binary mode ('wb').
- We use a chunking mechanism (f_in.read(4096)) to read and write the data piece by piece. This is what keeps memory use flat when handling large files and prevents memory exhaustion. A chunk size of 4096 bytes (4 KB) is a common and reasonable choice.
This streaming approach is highly scalable and suits large datasets that may not fit in memory.
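The manual read/write loop can also be delegated to shutil.copyfileobj from the standard library, which copies between two file objects in fixed-size chunks. The sketch below is one possible variant, not the article's own code: the helper name gzip_file, the 64 KB chunk size, and the temporary sample file are assumptions chosen for the demonstration.

```python
import gzip
import os
import shutil
import tempfile

def gzip_file(src_path, dst_path, chunk_size=64 * 1024):
    """Compress src_path into dst_path without loading it all into memory."""
    with open(src_path, "rb") as f_in, gzip.open(dst_path, "wb") as f_out:
        # copyfileobj reads and writes chunk_size bytes at a time
        shutil.copyfileobj(f_in, f_out, chunk_size)

# Demonstration on a throwaway file inside a temporary directory
sample_content = b"Some repetitive text for compression.\n" * 10000
with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "sample.log")
    dst = src + ".gz"
    with open(src, "wb") as f:
        f.write(sample_content)

    gzip_file(src, dst)

    original_size = os.path.getsize(src)
    compressed_size = os.path.getsize(dst)
    with gzip.open(dst, "rb") as f:
        round_trip_ok = f.read() == sample_content

print(f"{original_size} -> {compressed_size} bytes, round trip ok: {round_trip_ok}")
```

The behavior is the same as the explicit loop; the main difference is that there is less code to get wrong.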
Compressing Data to a Network Socket
In network applications, sending uncompressed data can be inefficient because of bandwidth limits and added latency. Gzip compression can improve performance considerably. Imagine sending data from a server to a client: you can compress the data just before sending it over the socket.
The following example demonstrates the concept with a mock socket. In a real application, you would use the socket module, or a framework such as Flask or Django, to talk to actual network sockets.
import gzip
import io

def compress_and_send(data_stream, socket):
    # Create an in-memory binary stream (like a file)
    compressed_stream = io.BytesIO()
    # Wrap the in-memory stream with gzip.GzipFile
    with gzip.GzipFile(fileobj=compressed_stream, mode='wb') as gz_writer:
        # Write data from the input stream to the gzip writer
        while True:
            chunk = data_stream.read(4096)  # Read in chunks
            if not chunk:
                break
            gz_writer.write(chunk)
    # Get the compressed bytes from the in-memory stream
    # (only complete once the GzipFile has been closed by the with block)
    compressed_data = compressed_stream.getvalue()
    print(f"Sending {len(compressed_data)} bytes of compressed data over socket...")
    # Works with a real connected socket as well as the mock below
    socket.sendall(compressed_data)

# --- Mock setup for demonstration ---
# Simulate data coming from a source (e.g., a file or database query)
original_data_source = io.BytesIO(b"This is some data to be sent over the network. " * 10000)

# Mock socket object
class MockSocket:
    def sendall(self, data):
        print(f"Mock socket received {len(data)} bytes.")

mock_socket = MockSocket()
print("Starting compression and mock send...")
compress_and_send(original_data_source, mock_socket)
print("Mock send complete.")
In this scenario:
- We use io.BytesIO to create an in-memory binary stream that behaves like a file.
- We pass this stream to gzip.GzipFile via the fileobj argument.
- gzip.GzipFile writes the compressed data into the io.BytesIO object.
- Finally, we retrieve the compressed bytes with compressed_stream.getvalue() and send them over the network socket.
This pattern is the basis for implementing Gzip compression in custom network protocols, and it mirrors what web servers such as Nginx or Apache do at the HTTP level.
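The example above holds the whole compressed payload in memory before sending it. When that is not acceptable, one alternative is to produce gzip output incrementally with zlib.compressobj, using a wbits value of zlib.MAX_WBITS | 16 so that zlib emits gzip framing. This is a sketch of that swapped-in technique under stated assumptions, not the article's method: the generator name iter_gzip_chunks and the level of 6 are illustrative choices.

```python
import gzip
import io
import zlib

def iter_gzip_chunks(data_stream, chunk_size=4096, level=6):
    """Yield gzip-framed compressed pieces as input is read from data_stream."""
    # wbits = MAX_WBITS | 16 asks zlib to write a gzip header and trailer
    compressor = zlib.compressobj(level, zlib.DEFLATED, zlib.MAX_WBITS | 16)
    while True:
        chunk = data_stream.read(chunk_size)
        if not chunk:
            break
        compressed = compressor.compress(chunk)
        if compressed:  # zlib may buffer internally and return b""
            yield compressed
    # flush() emits any buffered data followed by the gzip trailer
    yield compressor.flush()

message = b"This is some data to be sent over the network. " * 10000
source = io.BytesIO(message)

# A real sender would call sock.sendall(piece) for each piece instead of collecting them
sent_pieces = list(iter_gzip_chunks(source))
payload = b"".join(sent_pieces)
print(f"{len(message)} bytes sent as {len(sent_pieces)} pieces, {len(payload)} bytes total")
```

The concatenated pieces form one ordinary gzip stream, so the receiver can decode it with the gzip module or any other gzip implementation.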
Stream Decompression with gzip.GzipFile
Decompression matters just as much as compression. The gzip module provides straightforward ways to decompress data from streams.
Decompressing Data from a File
To read data from a .gz file, open a GzipFile object in read-binary mode ('rb').
import gzip
import os

# Assuming 'compressed_data.gz' was created in the previous example
file_name = "compressed_data.gz"

if os.path.exists(file_name):
    try:
        # Open the gzip file in read binary mode
        with gzip.GzipFile(file_name, 'rb') as gz_file:
            decompressed_data = gz_file.read()
        print(f"Data successfully decompressed from {file_name}")
        print(f"Decompressed data: {decompressed_data.decode('utf-8')}")  # Decode to string for display
    except FileNotFoundError:
        print(f"Error: File {file_name} not found.")
    except gzip.BadGzipFile:
        print(f"Error: File {file_name} is not a valid gzip file.")
    except Exception as e:
        print(f"An error occurred during decompression: {e}")
else:
    print(f"Error: File {file_name} does not exist. Please run the compression example first.")
Key points:
- Opening the file with gzip.GzipFile in 'rb' mode means the data is decompressed on the fly as it is read.
- gz_file.read() reads the entire decompressed content. For very large files, read in chunks instead: while chunk := gz_file.read(4096): ... (the := operator requires Python 3.8 or later).
- We decode the resulting bytes to a str for display, assuming the original data was UTF-8 encoded text.
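For text data, an alternative to fixed-size chunks is to iterate over the file line by line, which also avoids holding the full content in memory. The following sketch assumes a hypothetical log file and a helper named count_matching_lines; neither comes from the article.

```python
import gzip
import os
import tempfile

def count_matching_lines(gz_path, needle):
    """Count lines containing needle, decompressing the file lazily."""
    matches = 0
    with gzip.open(gz_path, "rt", encoding="utf-8") as f:
        for line in f:  # lines are decompressed and decoded as they are requested
            if needle in line:
                matches += 1
    return matches

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "app.log.gz")
    # Build a small hypothetical log: every 100th entry is an error
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for i in range(1000):
            level = "ERROR" if i % 100 == 0 else "INFO"
            f.write(f"{level} event {i}\n")
    error_count = count_matching_lines(path, "ERROR")

print(f"Lines containing ERROR: {error_count}")
```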
Decompressing Data to an Existing Stream
As with compression, you can decompress data from a gzip stream and write it to another destination, such as a regular file or a network socket.
import gzip
import io
import os

# Create a dummy compressed file for demonstration
original_content = b"Decompression test. This content will be compressed and then decompressed. " * 5000
compressed_file_for_decomp = "temp_compressed_for_decomp.gz"
with gzip.GzipFile(compressed_file_for_decomp, 'wb') as f_out:
    f_out.write(original_content)
print(f"Created dummy compressed file: {compressed_file_for_decomp}")

output_file_path = "decompressed_output.txt"

try:
    # Open the input gzip file in read binary mode
    with gzip.GzipFile(compressed_file_for_decomp, 'rb') as f_in:
        # Open the output file in write binary mode
        with open(output_file_path, 'wb') as f_out:
            # Read compressed data in chunks and write decompressed data
            while True:
                chunk = f_in.read(4096)  # Reads decompressed data in chunks
                if not chunk:
                    break
                f_out.write(chunk)
    print(f"Successfully decompressed {compressed_file_for_decomp} to {output_file_path}")

    # Optional: Verify content integrity (for demonstration)
    with open(output_file_path, 'rb') as f_verify:
        read_content = f_verify.read()
    if read_content == original_content:
        print("Content verification successful: Decompressed data matches original.")
    else:
        print("Content verification failed: Decompressed data does NOT match original.")
except FileNotFoundError:
    print(f"Error: Input file {compressed_file_for_decomp} not found.")
except gzip.BadGzipFile:
    print(f"Error: Input file {compressed_file_for_decomp} is not a valid gzip file.")
except Exception as e:
    print(f"An error occurred during decompression: {e}")
finally:
    # Clean up dummy files
    if os.path.exists(compressed_file_for_decomp):
        os.remove(compressed_file_for_decomp)
    if os.path.exists(output_file_path):
        # os.remove(output_file_path)  # Uncomment to remove the output file as well
        pass
In this streaming decompression:
- We open the source .gz file with gzip.GzipFile(..., 'rb').
- We open the destination file (output_file_path) in write-binary mode ('wb').
- Each f_in.read(4096) call reads up to 4096 bytes of *decompressed* data from the gzip stream.
- Each decompressed chunk is then written to the output file.
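Decompressing Data from a Network Socket
When you receive data over a network that is expected to be Gzip-compressed, you can decompress it once it has arrived. The example below buffers the full message first and then decompresses it.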
import gzip
import io

def decompress_and_process(socket_stream):
    # Create an in-memory binary stream to hold compressed data
    compressed_buffer = io.BytesIO()
    # Read data from the socket in chunks and append to the buffer
    # In a real app, this loop would continue until connection closes or EOF
    print("Receiving compressed data...")
    bytes_received = 0
    while True:
        try:
            # Simulate receiving data from socket. Replace with actual socket.recv()
            # For demo, let's generate some compressed data to simulate receipt
            if bytes_received == 0:  # First chunk
                # Simulate sending a small compressed message
                original_msg = b"Hello from the compressed stream! " * 50
                buffer_for_compression = io.BytesIO()
                with gzip.GzipFile(fileobj=buffer_for_compression, mode='wb') as gz_writer:
                    gz_writer.write(original_msg)
                chunk_to_receive = buffer_for_compression.getvalue()
            else:
                chunk_to_receive = b""
            if not chunk_to_receive:
                print("No more data from socket.")
                break
            compressed_buffer.write(chunk_to_receive)
            bytes_received += len(chunk_to_receive)
            print(f"Received {len(chunk_to_receive)} bytes. Total received: {bytes_received}")
            # In a real app, you might process partially if you have delimiters
            # or know the expected size, but for simplicity here, we'll process after receiving all.
        except Exception as e:
            print(f"Error receiving data: {e}")
            break
    print("Finished receiving. Starting decompression...")
    compressed_buffer.seek(0)  # Rewind the buffer to read from the beginning
    try:
        # Wrap the buffer with gzip.GzipFile for decompression
        with gzip.GzipFile(fileobj=compressed_buffer, mode='rb') as gz_reader:
            # Read decompressed data
            decompressed_data = gz_reader.read()
        print("Decompression successful.")
        print(f"Decompressed data: {decompressed_data.decode('utf-8')}")
        # Process the decompressed_data here...
    except gzip.BadGzipFile:
        print("Error: Received data is not a valid gzip file.")
    except Exception as e:
        print(f"An error occurred during decompression: {e}")

# --- Mock setup for demonstration ---
# In a real scenario, 'socket_stream' would be a connected socket object
# For this demo, we'll pass our BytesIO buffer which simulates received data
# Simulate a socket stream that has received some compressed data
# (This part is tricky to mock perfectly without a full socket simulation,
# so the function itself simulates receiving and then processes)
decompress_and_process(None)  # Pass None as the actual socket object is mocked internally for demo
The strategy here is:
- Receive data from the network socket and store it in an in-memory buffer (io.BytesIO).
- Once all expected data has been received (or the connection has closed), rewind the buffer.
- Wrap the buffer with gzip.GzipFile in read-binary mode ('rb').
- Read the decompressed data from this wrapper.
Note: In real-time streaming, you may want to decompress data as it arrives. That requires more careful buffering and handling, because a chunk received from the network can end in the middle of a compressed block and cannot be decompressed on its own.
Simplifying with gzip.open()
For many common scenarios, especially when working with files directly, gzip.open() offers a more concise syntax that closely resembles Python's built-in open().
Writing (Compressing) with gzip.open()
import gzip

output_filename = "simple_compressed.txt.gz"
content_to_write = "This is a simple text file being compressed using gzip.open().\n"

try:
    # Open in text write mode ('wt') for automatic encoding/decoding
    with gzip.open(output_filename, 'wt', encoding='utf-8') as f:
        f.write(content_to_write)
        f.write("Another line of text.")
    print(f"Successfully wrote compressed data to {output_filename}")
except Exception as e:
    print(f"An error occurred: {e}")
Key differences from GzipFile:
- You can open the file in text mode ('wt') and specify an encoding, which makes working with str straightforward.
- The underlying compression is handled automatically.
Reading (Decompressing) with gzip.open()
import gzip
import os

input_filename = "simple_compressed.txt.gz"

if os.path.exists(input_filename):
    try:
        # Open in text read mode ('rt') for automatic decoding
        with gzip.open(input_filename, 'rt', encoding='utf-8') as f:
            read_content = f.read()
        print(f"Successfully read decompressed data from {input_filename}")
        print(f"Content: {read_content}")
    except FileNotFoundError:
        print(f"Error: File {input_filename} not found.")
    except gzip.BadGzipFile:
        print(f"Error: File {input_filename} is not a valid gzip file.")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Clean up the created file
        if os.path.exists(input_filename):
            os.remove(input_filename)
else:
    print(f"Error: File {input_filename} does not exist. Please run the writing example first.")
Using 'rt' lets you read the content directly as a str, with Python handling the UTF-8 decoding.
gzip.compress() and gzip.decompress() for Bytes Strings
When you simply need to compress or decompress a bytes string held in memory, without dealing with files or streams, gzip.compress() and gzip.decompress() are ideal.
import gzip
original_bytes = b"This is a short string that will be compressed and decompressed in memory."
# Compress
compressed_bytes = gzip.compress(original_bytes)
print(f"Original size: {len(original_bytes)} bytes")
print(f"Compressed size: {len(compressed_bytes)} bytes")
# Decompress
decompressed_bytes = gzip.decompress(compressed_bytes)
print(f"Decompressed size: {len(decompressed_bytes)} bytes")
# Verify
print(f"Original equals decompressed: {original_bytes == decompressed_bytes}")
print(f"Decompressed content: {decompressed_bytes.decode('utf-8')}")
These functions are the simplest way to compress or decompress small pieces of data in memory. They are not suited to very large data, because both the input and the output must fit in memory at once.
Advanced Options and Considerations
The gzip.GzipFile constructor and gzip.open() accept additional parameters that affect compression and file handling:
- compresslevel: an integer from 0 to 9 that controls the compression level. 0 means no compression; 9 is the slowest and produces the smallest output. The default is 9.
- mtime (GzipFile only): controls the modification time stored in the gzip header. If set to None, the current time is used.
- filename: the original file name, which can be stored in the gzip header and is useful to some utilities.
- fileobj (GzipFile only): used to wrap an existing file-like object. With gzip.open(), an existing file object can be passed in place of the file name instead.
- mode: as discussed, 'rb' for reading/decompressing and 'wb' for writing/compressing; 'rt' and 'wt' for text mode with gzip.open().
- encoding (gzip.open() only): important when using text mode ('rt', 'wt'), to specify how str is converted to bytes and back.
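The mtime parameter matters when compressed output should be byte-for-byte reproducible, for example when archives are compared by checksum. Here is a minimal sketch that fixes the timestamp to 0; the helper name gzip_bytes_reproducible is hypothetical, and the identical-output claim assumes the same Python and zlib versions on both runs.

```python
import gzip
import io

def gzip_bytes_reproducible(data, compresslevel=9):
    """Compress data with a fixed header timestamp so repeated runs match."""
    buf = io.BytesIO()
    # mtime=0 replaces the current time that would otherwise be written
    # into the header; a BytesIO target also means no file name is stored.
    with gzip.GzipFile(fileobj=buf, mode="wb",
                       compresslevel=compresslevel, mtime=0) as gz:
        gz.write(data)
    return buf.getvalue()

data = b"Same input, same output. " * 100
first = gzip_bytes_reproducible(data)
second = gzip_bytes_reproducible(data)

print(f"Outputs identical: {first == second}")
print(f"Timestamp field in header: {first[4:8]!r}")
```

Without mtime=0, two runs made at different times would differ in the four timestamp bytes of the header even though the compressed payload is the same.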
Choosing the Right Compression Level
The compresslevel parameter (0 to 9) trades speed against file size reduction:
- Levels 0 to 3: faster compression, less size reduction. Suitable when speed matters and file size is less of a concern.
- Levels 4 to 6: a balanced approach, with good compression at reasonable speed.
- Levels 7 to 9: slower compression, maximum size reduction. Ideal when storage space is limited or bandwidth is very expensive, and compression time is not the bottleneck.
For most general-purpose applications, the default (level 9) is often adequate. In performance-sensitive scenarios, such as real-time data streaming from a web server, it can be worth experimenting with lower levels.
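Because the right level depends on the data and the hardware, it is usually best to measure. The sketch below compresses the same sample at a few levels and reports size and elapsed time. The sample text and the chosen levels are arbitrary, and timings will vary from machine to machine, so no expected numbers are given.

```python
import gzip
import time

# Arbitrary repetitive sample, loosely modeled on the log lines used earlier
sample = b"This is line number 12345. Some repetitive text for compression. \n" * 20000

results = {}
for level in (0, 1, 6, 9):
    start = time.perf_counter()
    compressed = gzip.compress(sample, compresslevel=level)
    elapsed = time.perf_counter() - start
    results[level] = len(compressed)
    print(f"level {level}: {len(compressed):>8} bytes in {elapsed * 1000:.1f} ms")
```

Running this against a representative slice of real data gives a much better basis for picking a level than general rules of thumb.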
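Error Handling: BadGzipFile
Handling potential errors is essential. The most common exception when dealing with corrupted files or files that are not gzip at all is gzip.BadGzipFile (available since Python 3.8; it is a subclass of OSError). A stream that is cut short typically raises EOFError instead. Always wrap gzip operations in try...except blocks.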
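As a small illustration of the two failure modes just described, the sketch below feeds valid, invalid, and truncated input to gzip.decompress(). The wrapper name safe_decompress and the error strings are made up for this example. Damage in the middle of the compressed data can also surface as zlib.error, which this sketch deliberately does not handle.

```python
import gzip

def safe_decompress(payload):
    """Return (data, error); exactly one of the two is None."""
    try:
        return gzip.decompress(payload), None
    except gzip.BadGzipFile as exc:  # wrong magic bytes or failed checksum
        return None, f"not a valid gzip stream: {exc}"
    except EOFError as exc:          # input ended before the gzip trailer
        return None, f"truncated gzip stream: {exc}"

good = gzip.compress(b"payload " * 100)

ok_data, ok_err = safe_decompress(good)
_, bad_err = safe_decompress(b"this is plainly not gzip data")
_, cut_err = safe_decompress(good[: len(good) // 2])

print(ok_err)
print(bad_err)
print(cut_err)
```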
Compatibility with Other Gzip Implementations
Python's gzip module is designed to be compatible with the standard GNU zip utility. Files compressed with Python can be decompressed with the gzip command-line tool, and vice versa. This interoperability matters for global systems in which different components may use different tools for data handling.
Global Applications of Python Gzip
The efficient and robust nature of Python's gzip module makes it valuable across a wide range of global applications:
- Web servers and APIs: compressing HTTP responses (for example, with the HTTP header Content-Encoding: gzip) reduces bandwidth usage and improves load times for users worldwide. Frameworks such as Flask and Django can be configured to support this.
- Data archiving and backups: compressing large log files, database dumps, or other important data before storing it saves disk space and shortens backup times. This matters for organizations operating globally with extensive data storage needs.
- Log file aggregation: in distributed systems with servers in different regions, logs are often collected centrally. Compressing these logs before transmission significantly reduces network traffic costs and speeds up ingestion.
- Data transfer protocols: implementing custom protocols that need efficient data transfer over potentially unreliable or low-bandwidth networks. With gzip, more data can be sent in less time.
- Scientific computing and data science: storing large datasets (sensor readings, simulation outputs, and so on) in compressed formats such as .csv.gz or .json.gz is standard practice. Libraries like Pandas can read these directly.
- Cloud storage and CDN integration: many cloud storage services and content delivery networks (CDNs) use gzip compression for static assets to improve delivery performance for end users around the world.
- Internationalization (i18n) and localization (l10n): gzip does not compress language files as such, but the efficient transfer of translation resources and configuration files when they are downloaded benefits from it.
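The .csv.gz convention mentioned above needs nothing beyond the standard library: gzip.open() in text mode can be handed straight to the csv module. The following is a minimal sketch with made-up sensor rows and a hypothetical file name, written to a temporary directory.

```python
import csv
import gzip
import os
import tempfile

# Made-up sample rows for illustration
rows = [
    {"sensor": "a1", "reading": "20.5"},
    {"sensor": "a2", "reading": "21.1"},
    {"sensor": "b7", "reading": "19.8"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "readings.csv.gz")

    # newline="" is what the csv module expects from text-mode file objects
    with gzip.open(path, "wt", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["sensor", "reading"])
        writer.writeheader()
        writer.writerows(rows)

    # Reading back decompresses and parses row by row
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        loaded = list(csv.DictReader(f))

print(f"Loaded {len(loaded)} rows; first row: {loaded[0]}")
```

Note that csv returns every value as a str, so numeric columns still need converting after loading.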
International considerations:
- Bandwidth variability: internet infrastructure varies widely between regions. Gzip helps keep performance acceptable for users in bandwidth-constrained areas.
- Data sovereignty and storage: reducing data volume through compression can help manage storage costs and comply with regulations on data volume and retention.
- Time zones and processing: stream processing with gzip makes it possible to handle data generated across multiple time zones efficiently, without overwhelming processing or storage resources at any single point.
- Currency and cost: reduced data transfer translates directly into lower bandwidth costs, a significant factor for global operations.
Best Practices for Using Python Gzip
- Use with statements: always use with gzip.GzipFile(...) or with gzip.open(...) so that files are closed properly and resources are released.
- Handle bytes: remember that gzip operates on bytes. When working with str, encode to bytes before compression and decode after decompression. gzip.open() in text mode simplifies this.
- Stream large data: for files larger than available memory, always use a chunked approach (reading and writing in smaller blocks) rather than trying to load the whole dataset.
- Handle errors: implement robust error handling, especially for gzip.BadGzipFile, and consider network errors in streaming applications.
- Choose an appropriate compression level: balance compression ratio against performance needs. Experiment if performance is critical.
- Use the .gz extension: although the module does not strictly require it, the .gz extension is a standard convention that helps identify gzip-compressed files.
- Text versus binary: understand when to use binary mode ('rb', 'wb') for raw byte streams and when to use text mode ('rt', 'wt') for str, and make sure you specify the correct encoding.
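The bytes-versus-text distinction in the list above is easy to trip over, so here is a small sketch of what happens when a str is written in binary mode, and how text mode reads the data back as str. The variable names are illustrative only.

```python
import gzip
import io

buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    try:
        gz.write("text, not bytes")  # a str is rejected in binary mode
        binary_accepts_str = True
    except TypeError:
        binary_accepts_str = False
    # Encoding explicitly turns the str into bytes that gzip can accept
    gz.write("text, now encoded\n".encode("utf-8"))

# gzip.open() also accepts an existing file object in place of a path;
# in 'rt' mode it decodes the decompressed bytes back into str.
buf.seek(0)
with gzip.open(buf, "rt", encoding="utf-8") as reader:
    decoded_text = reader.read()

print(f"Binary mode accepted str: {binary_accepts_str}")
print(f"Read back as str: {decoded_text!r}")
```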
Conclusion
Python's gzip module is an indispensable tool for developers who work with data in any capacity. Its ability to perform stream compression and decompression efficiently makes it a cornerstone for optimizing applications that handle data transfer, storage, and processing, particularly at global scale. By understanding the nuances of gzip.GzipFile, gzip.open(), and the utility functions, you can significantly improve the performance of Python applications, reduce their resource footprint, and meet the diverse needs of an international audience.
Whether you are building a high-traffic web service, managing large datasets for scientific research, or simply optimizing local file storage, the principles of stream compression and decompression with Python's gzip module will serve you well. Use these tools to build more efficient, scalable, and cost-effective solutions for the global digital landscape.